Reinforcement Learning for Humanoid Robotics

Authors

Jan Peters

Sethu Vijayakumar

Stefan Schaal

Abstract

Reinforcement learning offers one of the most general frameworks for taking traditional robotics towards true autonomy and versatility. However, applying reinforcement learning to high-dimensional movement systems like humanoid robots remains an unsolved problem. In this paper, we discuss different approaches to reinforcement learning in terms of their applicability in humanoid robotics. Methods can be coarsely classified into three categories: greedy methods, ‘vanilla’ policy gradient methods, and natural gradient methods. We argue that greedy methods are unlikely to scale into the domain of humanoid robotics, as they are problematic when used with function approximation. ‘Vanilla’ policy gradient methods, on the other hand, have been successfully applied on real-world robots, including at least one humanoid robot [3]. We demonstrate that these methods can be significantly improved by using the natural policy gradient instead of the regular policy gradient. A derivation of the natural policy gradient is provided, proving that the average policy gradient of Kakade [10] is indeed the true natural gradient. A general algorithm for estimating the natural gradient, the Natural Actor-Critic algorithm, is introduced. Under suitable conditions, this algorithm converges to the nearest local minimum of the cost function with respect to the Fisher information metric. The algorithm outperforms non-natural policy gradients by far, both in a cart-pole balancing evaluation and in learning nonlinear dynamic motor primitives for humanoid robot control. It offers a promising route for the development of reinforcement learning for truly high-dimensional continuous state-action systems.
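To make the distinction between the 'vanilla' and the natural policy gradient concrete, the following is a minimal sketch and not the Natural Actor-Critic algorithm described in the paper. It assumes a toy one-step problem with a one-dimensional Gaussian policy parameterized by theta = (mu, log_sigma), a hypothetical quadratic reward, and plain Monte Carlo estimates. The function names `score` and `gradients` are illustrative only. The natural gradient is obtained by premultiplying the vanilla gradient estimate by the inverse of the estimated Fisher information matrix.

```python
import numpy as np


def score(theta, actions):
    """Gradient of log N(a; mu, sigma^2) w.r.t. theta = (mu, log_sigma)."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    z = (actions - mu) / sigma
    # d/d mu = (a - mu) / sigma^2 ; d/d log_sigma = z^2 - 1
    return np.stack([z / sigma, z**2 - 1.0], axis=1)


def gradients(theta, reward_fn, n_samples=20000, seed=0):
    """Monte Carlo estimates of the vanilla gradient, Fisher matrix,
    and natural gradient for a one-step (bandit-like) toy problem."""
    rng = np.random.default_rng(seed)
    mu, log_sigma = theta
    actions = rng.normal(mu, np.exp(log_sigma), size=n_samples)
    psi = score(theta, actions)                   # shape (n_samples, 2)
    rewards = reward_fn(actions)
    baseline = rewards.mean()                     # simple variance-reducing baseline
    vanilla = (psi * (rewards - baseline)[:, None]).mean(axis=0)
    fisher = psi.T @ psi / n_samples              # empirical Fisher information
    natural = np.linalg.solve(fisher, vanilla)    # F^{-1} g without explicit inverse
    return vanilla, fisher, natural


# Hypothetical reward with its optimum at a = 1 (an assumption for illustration).
theta0 = (0.0, np.log(0.1))
vanilla, fisher, natural = gradients(theta0, lambda a: -(a - 1.0) ** 2)
print("vanilla gradient:", vanilla)
print("natural gradient:", natural)
```

For this assumed reward and policy, the Fisher matrix is analytically diag(1/sigma^2, 2), so with sigma = 0.1 the natural gradient rescales the mean component by roughly sigma^2. The step size then no longer depends on how narrowly the policy happens to be parameterized, which is the kind of invariance the natural gradient is meant to provide.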


Similar resources

Episodic Reinforcement Learning Control Approach for Biped Walking

This paper presents a hybrid dynamic control approach to the realisation of humanoid biped robotic walking, focusing on policy gradient episodic reinforcement learning with fuzzy evaluative feedback. The proposed controller structure involves two feedback loops: a conventional computed torque controller and an episodic reinforcement learning controller. The reinforcement learning part inclu...


Robo-Erectus: a low-cost autonomous humanoid soccer robot

The humanoid soccer robot league is a new international initiative to foster robotics and AI technologies using soccer games [1]. This paper provides a brief description of a low-cost autonomous humanoid soccer robot called Robo-Erectus (RE), which has been developed in the Center for Advanced Robotics and Intelligent Control (ARICC) at Singapore Polytechnic since 2001. To develop a low-cost hu...


Policy Gradient Methods for Robot Control

Reinforcement learning offers the most general framework to take traditional robotics towards true autonomy and versatility. However, applying reinforcement learning to high dimensional movement systems like humanoid robots remains an unsolved problem. In this paper, we discuss different approaches of reinforcement learning in terms of their applicability in humanoid robotics. Methods can be co...


Reinforcement learning control algorithm for humanoid robot walking

This paper presents the integrated dynamic control of humanoid locomotion mechanisms based on a spatial dynamic model of the humanoid mechanism. The control scheme was synthesized using the centralized model, with a proposed dynamic controller structure that involves two feedback loops: position-velocity feedback of the robotic mechanism joints and reinforcement learning feedback around Ze...


Learning to Acquire Whole-Body Humanoid Center of Mass Movements to Achieve Dynamic Tasks

This paper presents a novel approach for acquiring dynamic whole-body movements on humanoid robots focused on learning a control policy for the center of mass (CoM). In our approach, we combine both a model-based CoM controller and a model-free reinforcement learning (RL) method to acquire dynamic whole-body movements in humanoid robots. (i) To cope with high dimensionality, we use a model-base...



Publication date: 2003
